Nearest-neighbour Searching in Files of Text Signatures Using Transputer Networks

Authors

  • Janey K. Cringean
  • Roger England
  • Gordon A. Manson
  • Peter Willett

Abstract

This paper discusses the implementation of nearest-neighbour document retrieval in serial files using transputer networks. The system uses a two-stage retrieval algorithm in which an initial text-signature search is used to exclude large numbers of documents from the detailed and time-consuming pattern-matching search. The latter is implemented using a processor farm, so that documents which match at the signature level can be examined in parallel to determine whether they are, in fact, a good match for the query. The results demonstrate that communication is the critical factor in all of the transputer networks that were investigated. A high degree of speed-up can be obtained when only the pattern-matching search is carried out. When text signatures are used, however, the speed-up is less, decreasing in line with an increase in the size of the text signatures that are used.
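
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 64-bit signature width, the CRC32 word hashing, the `min_bits` cut-off, the word-overlap score, and every function name are assumptions chosen for illustration, and a thread pool merely stands in for the transputer processor farm.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SIG_BITS = 64  # assumed signature width; the paper varies signature size


def words(text):
    """Split text into a set of lower-cased words (hypothetical tokeniser)."""
    return set(text.lower().split())


def signature(text, bits=SIG_BITS):
    """Build a bit-string signature: each word sets one hashed bit position."""
    sig = 0
    for w in words(text):
        sig |= 1 << (zlib.crc32(w.encode("utf-8")) % bits)
    return sig


def detailed_score(query, doc):
    """Stage 2 stand-in: count the query words that really occur in the document."""
    return len(words(query) & words(doc))


def search(query, docs, min_bits=1, top_k=3, workers=4):
    """Return up to top_k (doc_index, score) pairs, best match first."""
    q_sig = signature(query)
    doc_sigs = [signature(d) for d in docs]  # would be precomputed in practice

    # Stage 1: cheap signature screen. Documents sharing too few bits with
    # the query are excluded; hash collisions can still let false matches in.
    candidates = [i for i, s in enumerate(doc_sigs)
                  if bin(q_sig & s).count("1") >= min_bits]

    # Stage 2: the worker pool examines the surviving candidates in parallel.
    with ThreadPoolExecutor(max_workers=workers) as farm:
        scores = list(farm.map(lambda i: detailed_score(query, docs[i]),
                               candidates))

    # Rank by descending score, breaking ties by document index.
    ranked = sorted(zip(candidates, scores), key=lambda p: (-p[1], p[0]))
    return ranked[:top_k]
```

Calling `search("text signature search", docs, top_k=2)` on a small list of strings returns the two best-scoring documents as `(index, score)` pairs. A document whose signature passes stage 1 only through a hash collision simply receives a low score in stage 2, which is why the detailed search is still needed after the signature screen.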

Similar resources

Paragraph-based nearest neighbour searching in full-text documents

This paper discusses the searching of full-text documents to identify paragraphs that are relevant to a user request. Given a natural language query statement, a nearest neighbour search involves ranking the paragraphs comprising a full-text document in order of descending similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has...

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and conversion of any handwritten document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Augmenting Approximate Similarity Searching with Lexical Information

Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to compare context vectors extracted from large corpora scales poorly. The Spatial Approximation Sample Hierarchy (SASH) is a data-structure for performing approximate nearest-neighbour queries, and has been previou...

Editorial: special issue on information retrieval

Information is continuing to grow exponentially and the increasing utilization of electronic and optical publishing technologies is making available large machine-readable document collections. There is a strong need for sophisticated and innovative retrieval systems which can provide satisfactory access to such amounts of stored information. Information Retrieval (IR) has been developing from ...

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a variety of library resources. However, classifying documents from a large amount of data is still an issue, and finding particular documents demands time and energy. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Journal title:
  • Electronic Publishing

Volume 4  Issue 

Pages  -

Publication date 1991